Prototype batching lock intent for skip locked #31344
Draft
emilienoel wants to merge 1 commit into
Batched SKIP LOCKED Walkthrough
This document walks through the rebased change and identifies whether each step happens in the YSQL layer or on the tserver side.
Definitions:
1. Enable batching via GUC
Where: YSQL
Files:
Adds:
    yb_skip_locked_batch_size
Default:
    32
Meaning:
- 1 disables the optimization.
- >1 allows the YSQL executor to prefetch multiple candidate rows for SKIP LOCKED.
2. Detect eligible SKIP LOCKED query
Where: YSQL
File:
In ExecLockRows, the batch path is used only when:
So YSQL decides whether to use the optimization.
This means the query must target a YB relation, have exactly one row mark, and use SKIP LOCKED.
3. Prefetch candidate rows from the child plan
Where: YSQL
File:
Function:
    ExecLockRowsBatchSkipLocked(...)
YSQL pulls up to yb_skip_locked_batch_size rows from the child plan:
For each candidate row, YSQL extracts the ybctid:
Then stores:
At this point, YSQL has a local batch of candidate rows.
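The prefetch step above can be sketched in a few lines of C++17. This is only an illustration under stated assumptions: `Candidate`, `ChildPlan`, and `PrefetchCandidates` are hypothetical names, and strings stand in for the tuple slots and ybctid datums that the real executor code would handle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one prefetched row: its ybctid plus a copy of the tuple.
struct Candidate {
  std::string ybctid;
  std::string tuple;
};

// Hypothetical child plan: returns the next row, or std::nullopt when exhausted.
using ChildPlan = std::function<std::optional<Candidate>()>;

// Pull up to batch_size rows from the child plan into a local batch.
// With batch_size == 1 this degenerates to fetching a single row.
std::vector<Candidate> PrefetchCandidates(const ChildPlan& next_row,
                                          std::size_t batch_size) {
  assert(batch_size >= 1);
  std::vector<Candidate> batch;
  batch.reserve(batch_size);
  while (batch.size() < batch_size) {
    std::optional<Candidate> row = next_row();
    if (!row) break;  // child plan ran out before the batch filled up
    batch.push_back(std::move(*row));
  }
  return batch;
}
```

The batch may come back shorter than the configured size, or empty, so any caller has to handle both cases before building a lock request.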
4. Send candidate ybctids to PgGate
Where: YSQL
File:
Function:
    YBCLockTupleBatch(...)
YSQL creates a new select statement:
Then it adds each candidate ybctid:
This is still YSQL-side code preparing the request.
5. PgGate appends batch arguments to the read request
Where: YSQL / PgGate client side
Files:
Call chain:
    YBCPgDmlAddBatchYbctidArg(...)
calls:
    PgApiImpl::DmlAddBatchYbctidArg(...)
which calls:
    PgDmlRead::AddBatchYbctidArg(...)
That appends each candidate to:
    read_req_->add_batch_arguments();
So the request sent to the tserver contains:
This is PgGate building the tserver request, but it is on the YSQL side of the boundary.
6. Execute/fetch the YSQL statement, causing RPC to tserver
Where: starts in YSQL, crosses to tserver
File:
YSQL calls:
The fetch is what forces the PgGate operation to perform the remote read RPC.
Boundary:
7. tserver detects special batch SKIP LOCKED request
Where: tserver
File:
In ReadQuery::DoPerform, tserver checks:
    has_row_mark &&
    !serializable_isolation &&
    req_->pgsql_batch_size() == 1 &&
    pgsql_read.wait_policy() == WAIT_SKIP &&
    pgsql_read.batch_arguments_size() > 1
If true, tserver chooses the new path:
So the tserver, not YSQL, decides how to process the batched lock request internally.
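The eligibility check in this step can be restated as a standalone predicate. The sketch below is not the PR's code: `ReadRequestSketch`, `WaitPolicy`, and `UseBatchedSkipLockedPath` are hypothetical names that mirror the fields in the condition, and the protobuf accessors are replaced by plain struct members.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the wait policies; only kSkip matters for this check.
enum class WaitPolicy { kBlock, kSkip, kError };

// Hypothetical flattened view of the fields the condition reads.
struct ReadRequestSketch {
  bool has_row_mark = false;
  bool serializable_isolation = false;
  int pgsql_batch_size = 1;  // value of req_->pgsql_batch_size() in the text
  WaitPolicy wait_policy = WaitPolicy::kBlock;
  std::vector<std::string> batch_arguments;  // candidate ybctids
};

// True only when all five conditions listed in the walkthrough hold.
bool UseBatchedSkipLockedPath(const ReadRequestSketch& req) {
  return req.has_row_mark &&
         !req.serializable_isolation &&
         req.pgsql_batch_size == 1 &&
         req.wait_policy == WaitPolicy::kSkip &&
         req.batch_arguments.size() > 1;
}
```

Note that a request carrying a single batch argument does not qualify, so such a request would not take the batched path.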
8. tserver tries to lock one batch argument at a time
Where: tserver
File:
Function:
    ReadQuery::TryLockBatchArg(...)
For each candidate index, tserver builds a write operation that creates read-lock intents for only that candidate.
It calls:
    tablet_ptr->CreateReadIntentForBatchArg(
        isolation_level, pgsql_read, batch_arg_index, &write_batch);
Then it submits:
    peer->WriteAsync(std::move(query));
This is tserver-side async write/conflict resolution.
9. Tablet builds intents for exactly one candidate
Where: tserver
Files:
Call chain:
    Tablet::CreateReadIntentForBatchArg(...)
calls:
    docdb::GetIntentsForBatchArg(...)
That function picks one candidate:
and creates lock intents only for that candidate.
This is important: tserver does not lock all batch arguments at once.
10. tserver handles lock success or conflict
Where: tserver
File:
In the callback from WriteAsync:
If the lock succeeds
    self->first_locked_batch_arg_index_ = batch_arg_index;
    peer->Enqueue(self.get());
The tserver records the winning index and proceeds to read the row.
If the lock conflicts / transaction error occurs
    TransactionError(status).value() != TransactionErrorCode::kNone
Then because this is SKIP LOCKED, tserver skips this candidate and tries the next one:
If all candidates conflict
Eventually:
Then tserver sets:
    first_locked_batch_arg_index_ = -1;
and proceeds to the read phase with no winner.
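Steps 8 to 10 amount to a first-success search over the candidates. The sketch below flattens the asynchronous WriteAsync callback chain into a synchronous loop for readability, which is a simplification and not how the PR is structured. `LockAttempt`, `TryLockFn`, and `FindFirstLockable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of one lock attempt on a single candidate.
enum class LockAttempt { kLocked, kConflict };

// Hypothetical synchronous stand-in for "build intents for one candidate and
// submit the write"; the real path is asynchronous.
using TryLockFn = std::function<LockAttempt(const std::string& ybctid)>;

// Returns the index of the first candidate that could be locked, or -1 when
// every candidate conflicted. Candidates after the winner are never attempted.
int FindFirstLockable(const std::vector<std::string>& batch_arguments,
                      const TryLockFn& try_lock) {
  for (std::size_t i = 0; i < batch_arguments.size(); ++i) {
    if (try_lock(batch_arguments[i]) == LockAttempt::kLocked) {
      return static_cast<int>(i);
    }
    // Conflict under SKIP LOCKED: skip this candidate and try the next one.
  }
  return -1;
}
```

Because the search stops at the first success, at most one candidate ends up locked per request, which matches the point made in step 9 that the batch arguments are not all locked at once.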
11. tserver reads only the winning candidate
Where: tserver
File:
Function:
    ReadQuery::DoReadImpl()
The original request still contains all candidates:
But after locking, tserver creates a modified effective request.
If there is a winner:
So the actual read phase reads only the winning row.
If no candidate was locked:
    modified_req->clear_batch_arguments();
    effective_req = modified_req;
So the read returns zero rows.
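The narrowing of the request can be illustrated as a pure function over the argument list. This is a sketch under an assumption: in the winner case it keeps only the winning argument, which is inferred from the statement that the read phase reads only the winning row. `EffectiveBatchArguments` is a hypothetical name, and strings stand in for the protobuf batch arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the batch arguments the read phase should see.
// first_locked_index < 0 means no candidate was locked.
std::vector<std::string> EffectiveBatchArguments(
    const std::vector<std::string>& original, int first_locked_index) {
  if (first_locked_index < 0) {
    return {};  // mirrors clear_batch_arguments(): the read returns zero rows
  }
  const std::size_t winner = static_cast<std::size_t>(first_locked_index);
  assert(winner < original.size());
  return {original[winner]};  // assumed: only the winning row is read
}
```

The original argument list is left untouched, which is consistent with step 12 reporting the original batch_arguments_size() in the response.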
12. tserver populates response metadata
Where: tserver
File:
After reading, tserver sets:
    result.response->set_batch_arg_count(pgsql_read_req.batch_arguments_size());
That tells PgGate:
If there was a winner, tserver also sets:
    result.response->set_first_locked_batch_arg_index(first_locked_batch_arg_index_);
This field was added in:
as:
So the tserver response carries the winning candidate index back to YSQL.
Boundary:
13. PgGate reads the winner index from response
Where: YSQL / PgGate client side
Files:
Call chain:
    YBCPgDmlGetFirstLockedBatchArgIndex(...)
calls:
    PgApiImpl::DmlGetFirstLockedBatchArgIndex(...)
then:
    PgDmlRead::GetFirstLockedBatchArgIndex()
then:
    PgDocReadOp::GetFirstLockedBatchArgIndex()
which reads:
    resp->first_locked_batch_arg_index()
If absent, it returns:
    -1
14. YSQL maps the winner index to a PostgreSQL lock result
Where: YSQL
File:
Back in:
    YBCLockTupleBatch(...)
YSQL gets:
    winner
Then:
So YSQL converts tserver's response into PostgreSQL executor semantics:
- TM_Ok
- TM_WouldBlock
15. YSQL returns the winning tuple
Where: YSQL
File:
Back in:
    ExecLockRowsBatchSkipLocked(...)
If the batch lock returned TM_Ok, YSQL restores the winning tuple into the result slot:
Then it returns that tuple to the upper executor nodes.
So if the query was:
this is the point where the selected row goes back up the executor tree.
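Steps 14 and 15 can be sketched together as a mapping from the returned index to a lock result plus the position of the tuple to return. The sketch assumes that a non-negative index maps to TM_Ok and that -1 maps to TM_WouldBlock. `LockResultSketch`, `BatchLockOutcome`, and `MapWinner` are hypothetical names, and the enum covers only the two PostgreSQL TM_Result values that the walkthrough mentions.

```cpp
#include <cassert>

// Hypothetical two-value subset of PostgreSQL's TM_Result.
enum class LockResultSketch { kOk, kWouldBlock };

// Outcome of a batched lock attempt as seen by the executor.
struct BatchLockOutcome {
  LockResultSketch result;
  int winner;  // index into the local batch, or -1 when nothing was locked
};

// Map the index carried in the tserver response to executor semantics.
BatchLockOutcome MapWinner(int first_locked_batch_arg_index, int batch_size) {
  if (first_locked_batch_arg_index < 0) {
    // No candidate could be locked: under SKIP LOCKED the batch is skipped.
    return {LockResultSketch::kWouldBlock, -1};
  }
  assert(first_locked_batch_arg_index < batch_size);
  return {LockResultSketch::kOk, first_locked_batch_arg_index};
}
```

On the kOk branch the caller would restore the tuple stored at `winner` in its local batch into the result slot.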
16. YSQL saves candidates after the winner as leftovers
Where: YSQL
File:
Suppose the batch was:
    row0, row1, row2, row3
and tserver locked:
    row1
Then:
- row0 was tried and skipped.
- row1 is returned.
- row2, row3 were prefetched but not tried.
YSQL stores candidates after the winner:
    row2, row3
These are used on later calls to ExecLockRows.
17. YSQL tries leftovers before scanning more rows
Where: YSQL
File:
At the top of ExecLockRows, before fetching a new row from the child plan, YSQL checks:
If leftovers exist, it calls:
    ExecLockRowsTryLeftover(...)
That tries each leftover using the existing single-row path:
    YBCLockTuple(...)
The leftover entries themselves do not store a table id or relation key. They only store parallel arrays of:
The table is recovered from the same LockRowsState row mark:
and the lock call uses:
This is safe because the batch optimization is only enabled when:
So every prefetched candidate and every leftover belongs to the single row-marked YB relation for this LockRowsState. For multi-row-mark queries, such as joins with multiple locked tables, the batch path is not used because a bare leftover ybctid would not be enough to identify which relation to lock.
So after the first batch winner, later prefetched-but-untried rows are not lost.
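The leftover bookkeeping from steps 16 and 17 can be sketched as a small queue. This is an illustration only: `LeftoverState`, `SaveLeftovers`, and `TryLeftovers` are hypothetical names, a single deque of strings replaces the parallel arrays, and the `lock_one` callback stands in for the single-row lock path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-node state: candidates that were prefetched but not tried.
struct LeftoverState {
  std::deque<std::string> ybctids;  // kept in scan order
};

// After a batch with a winner, keep only the candidates after the winner.
// Candidates before the winner were already tried and skipped.
void SaveLeftovers(const std::vector<std::string>& batch, int winner,
                   LeftoverState* state) {
  assert(winner >= 0 && static_cast<std::size_t>(winner) < batch.size());
  for (std::size_t i = static_cast<std::size_t>(winner) + 1; i < batch.size(); ++i) {
    state->ybctids.push_back(batch[i]);
  }
}

// Try leftovers one at a time with a single-row lock. Returns the first one
// that locks, or std::nullopt once the leftovers are used up.
std::optional<std::string> TryLeftovers(
    LeftoverState* state,
    const std::function<bool(const std::string&)>& lock_one) {
  while (!state->ybctids.empty()) {
    std::string id = std::move(state->ybctids.front());
    state->ybctids.pop_front();
    if (lock_one(id)) return id;
    // Locked by someone else: SKIP LOCKED drops this leftover.
  }
  return std::nullopt;
}
```

Each leftover is consumed whether or not it locks, so no candidate is attempted twice, and an empty queue signals that the next call should go back to the child plan.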
18. YSQL cleans up leftovers
Where: YSQL
File:
In:
    ExecEndLockRows(...)
YSQL frees any unconsumed leftover tuples and ybctids.
Compact end-to-end sequence
- yb_skip_locked_batch_size controls batch size
- ExecLockRows detects eligible FOR UPDATE SKIP LOCKED
- YBCLockTupleBatch builds a select request
- PgGate appends the candidates to batch_arguments
- ReadQuery::DoPerform detects batched SKIP LOCKED
- TryLockBatchArg tries candidate 0
- tserver returns first_locked_batch_arg_index
- YBCLockTupleBatch maps winner to TM_Ok / TM_WouldBlock